

Multi-robot reinforcement learning path planning method based on request-response communication mechanism and local attention mechanism

邓辅秦1, 官桧锋1, 谭朝恩2, 付兰慧2, 王宏民3, 林天麟4, 张建民3

  1. Wuyi University
  2. Faculty of Intelligent Manufacturing, Wuyi University
  3. Wuyi University, 99 Yingbin Avenue, Jiangmen, Guangdong
  4. The Chinese University of Hong Kong, Shenzhen
  • Received: 2023-02-28; Revised: 2023-05-17; Online: 2023-08-14; Published: 2023-08-14
  • Corresponding author: 张建民
  • Supported by:
    5G-based autonomous collaboration technology for heterogeneous multi-robots in dynamic open environments; Shenzhen Science and Technology Program; exploratory research project of the Shenzhen Institute of Artificial Intelligence and Robotics for Society; development of autonomous collaboration technology for heterogeneous multi-robots in dynamic environments



Abstract: Multi-robot path planning in dynamic environments has considerable practical and academic value in the multi-robot field. To reduce the blocking rate of multi-robot path planning in dynamic environments, a distributed deep reinforcement learning path planning method based on a request-response communication mechanism and a local attention mechanism, named Distributed Communication and local Attention based Multi-agent Path Finding (DCAMAPF), was designed under the Actor-Critic framework of deep reinforcement learning. In the Actor network, based on the request-response communication mechanism, each robot requests the local observation and action information of the other robots within its field of view and then plans a coordinated action policy. In the Critic network, each robot uses the local attention mechanism to dynamically assign attention weights to the local observation and action information of the other robots that responded successfully within its field of view. Compared with the traditional dynamic path planning method D* Lite, the latest distributed reinforcement learning method Mapper, and the latest centralized reinforcement learning method AB-Mapper, DCAMAPF reduced the difference in blocking rate by about 6.91%, 4.97%, and 3.56%, respectively, in the discrete initialization environment; in the centralized initialization environment, it avoided blocking more efficiently, reducing the difference in blocking rate by about 15.86%, 11.71%, and 5.54%, respectively, while occupying less computing cache.
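The two mechanisms in the abstract can be illustrated with a short sketch. The code below is a minimal, hypothetical PyTorch rendering, not the authors' implementation: the `request_responses` helper stands in for the request-response exchange (each robot asks the robots in its field of view for their local observations and actions, and some may fail to answer), and `LocalAttentionCritic` masks the attention softmax so that weights are assigned only to neighbours that actually responded. All class names, dimensions, and the response model are assumptions.

```python
# Hypothetical sketch of the mechanisms described in the abstract.
# Names, dimensions, and the response model are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def request_responses(in_view: torch.Tensor, p_answer: float = 0.9) -> torch.Tensor:
    """Request-response stand-in: each robot requests information from robots
    in its field of view; a response may fail, so the returned boolean mask
    marks only the neighbours that successfully answered."""
    # in_view: (batch, N) bool, True where neighbour j is in robot i's view.
    answered = torch.rand_like(in_view, dtype=torch.float) < p_answer
    return in_view & answered

class LocalAttentionCritic(nn.Module):
    """Critic that attends only to the (observation, action) embeddings of
    neighbours that answered the request, per the local attention mechanism."""
    def __init__(self, obs_dim: int, act_dim: int, embed_dim: int = 64):
        super().__init__()
        # Shared encoder for a robot's concatenated local observation and action.
        self.encoder = nn.Linear(obs_dim + act_dim, embed_dim)
        self.query = nn.Linear(embed_dim, embed_dim, bias=False)
        self.key = nn.Linear(embed_dim, embed_dim, bias=False)
        self.value = nn.Linear(embed_dim, embed_dim, bias=False)
        # Q-value head over the robot's own embedding plus the attended context.
        self.q_head = nn.Sequential(
            nn.Linear(2 * embed_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, 1))

    def forward(self, own, neighbours, response_mask):
        # own:           (batch, obs_dim + act_dim)    robot i's own obs/action
        # neighbours:    (batch, N, obs_dim + act_dim) gathered from responders
        # response_mask: (batch, N) bool               successful responders only
        h_i = self.encoder(own)                       # (batch, E)
        h_j = self.encoder(neighbours)                # (batch, N, E)
        q = self.query(h_i).unsqueeze(1)              # (batch, 1, E)
        k, v = self.key(h_j), self.value(h_j)         # (batch, N, E)
        scores = (q * k).sum(-1) / k.size(-1) ** 0.5  # (batch, N)
        # Mask out non-responders so attention weight is distributed only
        # over robots whose answer arrived, then renormalise with a softmax.
        scores = scores.masked_fill(~response_mask, float("-inf"))
        weights = torch.nan_to_num(F.softmax(scores, dim=-1))    # 0 if none answered
        context = torch.bmm(weights.unsqueeze(1), v).squeeze(1)  # (batch, E)
        return self.q_head(torch.cat([h_i, context], dim=-1))    # (batch, 1)
```

In this sketch the communication step reduces to computing `response_mask`; in the method itself the Actor network would also use the gathered observations and actions to plan a coordinated action policy, which is omitted here.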

Key words: multi-robot path planning, deep reinforcement learning, attention mechanism, communication

